 classification and regression model


Machine learning method for return direction forecasting of Exchange Traded Funds using classification and regression models

arXiv.org Machine Learning

This article proposes and applies a machine learning method for analyzing the direction of returns of Exchange Traded Funds (ETFs) using the historical return data of their components, supporting investment strategy decisions through a trading algorithm. Methodologically, regression and classification models were applied to standard datasets from the Brazilian and American markets, together with algorithmic error metrics. The results were compared with those of the naïve forecast and with the returns obtained by the buy & hold technique over the same period. In terms of risk and return, the models mostly performed better than the control metrics, with emphasis on the linear regression model and the classification models by logistic regression, support vector machine (using the LinearSVC model), Gaussian Naive Bayes and K-Nearest Neighbors; in certain datasets, returns exceeded those of the buy & hold control model by a factor of two and the Sharpe ratio by a factor of up to four.
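The comparison described in the abstract, a direction-driven strategy against buy & hold on return and Sharpe ratio, can be illustrated with a minimal sketch. The abstract does not spell out the trading rule, so the long-or-cash rule below is an assumption, as are the function names and the toy numbers. The Sharpe ratio here is per period and not annualized.

```python
from statistics import mean, stdev


def sharpe_ratio(returns, risk_free=0.0):
    """Per-period Sharpe ratio: mean excess return over its sample std dev."""
    excess = [r - risk_free for r in returns]
    return mean(excess) / stdev(excess)


def strategy_returns(asset_returns, predicted_up):
    """Assumed rule: hold the ETF when the model predicts a positive return,
    otherwise stay in cash (0 return for that period)."""
    return [r if up else 0.0 for r, up in zip(asset_returns, predicted_up)]


def cumulative_return(returns):
    """Compound a sequence of simple per-period returns."""
    total = 1.0
    for r in returns:
        total *= 1.0 + r
    return total - 1.0


# Toy data, invented for illustration only (not taken from the paper).
etf_returns = [0.01, -0.02, 0.015, -0.005, 0.02, -0.01]
predicted_up = [True, False, True, True, True, False]  # hypothetical classifier output

strategy = strategy_returns(etf_returns, predicted_up)

print("buy & hold return:", cumulative_return(etf_returns))
print("strategy return:  ", cumulative_return(strategy))
print("buy & hold Sharpe:", sharpe_ratio(etf_returns))
print("strategy Sharpe:  ", sharpe_ratio(strategy))
```

In practice the `predicted_up` flags would come from a fitted classifier (or from the sign of a regression forecast), evaluated on data not used for training.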


8 databases supporting in-database machine learning

#artificialintelligence

In my August 2020 article, "How to choose a cloud machine learning platform," my first guideline for choosing a platform was, "Be close to your data." Keeping the code near the data is necessary to keep the latency low, since the speed of light limits transmission speeds. After all, machine learning -- especially deep learning -- tends to go through all your data multiple times (each time through is called an epoch). I said at the time that the ideal case for very large data sets is to build the model where the data already resides, so that no mass data transmission is needed. Several databases support that to a limited extent.


Tensorflow 2.0: Solving Classification and Regression Problems

#artificialintelligence

By the end of the 50th epoch, we have a training accuracy of 100% and a validation accuracy of 98.56%, which is impressive. Let's finally evaluate the performance of our classification model on the test set: our model achieves an accuracy of 97.39%. Though this is slightly lower than the training accuracy of 100%, it is still very good given that we chose the number of layers and nodes at random. You can add more layers with more nodes to the model and see if you get better results on the validation and test sets. In a regression problem, the goal is to predict a continuous value. In this section, you will see how to solve a regression problem with TensorFlow 2.0. The dataset for this problem can be downloaded freely from this link.
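The contrast drawn above, accuracy for classification versus predicting a continuous value for regression, can be sketched in plain Python. This is not the article's TensorFlow code; the function names and toy values are illustrative assumptions, and root mean squared error is used only as one common regression metric.

```python
from math import sqrt


def accuracy(y_true, y_pred):
    """Fraction of class labels predicted exactly right."""
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return correct / len(y_true)


def rmse(y_true, y_pred):
    """Root mean squared error between continuous targets and predictions."""
    squared = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
    return sqrt(sum(squared) / len(squared))


# Toy values, invented for illustration.
class_true = [0, 1, 1, 0, 1]
class_pred = [0, 1, 0, 0, 1]
reg_true = [3.0, 5.0, 2.5, 7.0]
reg_pred = [2.5, 5.0, 3.0, 8.0]

print("classification accuracy:", accuracy(class_true, class_pred))
print("regression RMSE:", rmse(reg_true, reg_pred))
```

A label is either right or wrong, so accuracy counts matches; a continuous prediction is almost never exactly right, so regression is scored by how far off it is.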


Advanced data exploration and modeling with Spark

#artificialintelligence

This walkthrough uses HDInsight Spark to do data exploration and train binary classification and regression models using cross-validation and hyperparameter optimization on a sample of the NYC taxi trip and fare 2013 dataset. It walks you through the steps of the Data Science Process, end-to-end, using an HDInsight Spark cluster for processing and Azure blobs to store the data and the models. The process explores and visualizes data brought in from an Azure Storage Blob and then prepares the data to build predictive models. Python is used to code the solution and to show the relevant plots. The models are built with the Spark MLlib toolkit for binary classification and regression modeling tasks.